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A BAYES METHOD FOR A MONOTONE HAZARD 
RATE VIA S-PATHS 1 

By Man-Wai Ho 

National University of Singapore 

A class of random hazard rates, which is defined as a mixture of 
an indicator kernel convolved with a completely random measure, is 
of interest. We provide an explicit characterization of the posterior 
distribution of this mixture hazard rate model via a finite mixture 
of S-paths. A closed and tractable Bayes estimator for the hazard 
rate is derived to be a finite sum over S-paths. The path characteri- 
zation or the estimator is proved to be a Rao-Blackwellization of an 
existing partition characterization or partition-sum estimator. This 
accentuates the importance of S-paths in Bayesian modeling of mono- 
tone hazard rates. An efficient Markov chain Monte Carlo (MCMC) 
method is proposed to approximate this class of estimates. It is shown 
that S-path characterization also exists in modeling with covariates 
by a proportional hazard model, and the proposed algorithm again 
applies. Numerical results of the method are given to demonstrate its 
practicality and effectiveness. 

1. Introduction. In reliability theory and survival analysis a hazard rate 
A(t) is interpreted as the propensity of failure of a system (or an item) in 
the instant future given that it has survived until time t. In general, the 
function has a wide variety of shapes. The simplest case of a constant haz- 
ard rate corresponds to an exponential lifetime distribution for the system. 
Cases of increasing or decreasing hazard rate, broadly speaking, correspond 
to lifetime distributions that are of a lighter or heavier tail, respectively, 
compared to an exponential distribution. There is a substantial amount of 
literature about estimation of monotone hazard rates from a frequentist 
viewpoint. They include, for example, the pioneering work of Grenander 
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[17] and Prakasa Rao [47], extensions of their work to different censoring 
schemes by Padgett and Wei [46] and Mykytyn and Santner [45], a con- 
strained spline smoothing technique by Villalobos and Wahba [49], work 
of Lo and Phadia [42] and Huang and Wellner [26] based on the least con- 
cave/greatest convex minorants, and a kernel-based method by Hall, Huang, 
Gifford and Gijbels [19]. 

A Bayesian nonparametric approach to this important problem is to use 
the fact that a monotone hazard rate on the half line 1Z = (0, oo) may be 
written in the form 

(1) A(t|/i)= / I(t<u)n(du), 

Jn 

where 1(^4) is the indicator function of a set A and \i is modeled as a random 
process. Dragichi and Ramamoorthi [13] establish the strong and weak con- 
sistency of the posterior distribution of these hazard rates for various choices 
of fj>. The consistency of the Bayes estimate follows as a consequence. This is 
important as it shows that such models yield viable estimators and, hence, 
alternatives to the approaches mentioned earlier. This approach was first 
utilized by Dykstra and Laud [14], wherein \x is modeled as an extended or 
weighted gamma process. Lo and Weng [41], specifying a weighted gamma 
process [38] for ^ and replacing the indicator function in (1) by more general 
kernels k(t\u), provide explicit posterior analysis for hazard rates with more 
general shapes, that is, 

(2) A(t|yu) = / k(t\u)n(du). 

Jn 

Their analysis, paralleling that of Lo [39], shows that, for general kernels, 
the posterior distribution can be characterized in terms of random par- 
titions, say, p = {Ci, . . . , C n ( p )}, of the integers {l,...,n} related to the 
Chinese restaurant process. Here for exchangeable observations X\, . . . ,X n , 
Cj = {i : Xi = X*}, for n(p) < n unique values X{,..., X*,-> . This is for- 
mally obtained from a posterior distribution which is a mixture of the Polya 
urn distribution [7] . See [30] for a recent discussion of these models relative 
to the Dirichlet and gamma processes in a semi-parametric context. It is now 
recognized in Bayesian statistics and spatial statistics that other model spec- 
ifications for \x may be of interest. Such generalizations of (2), also known 
as Levy moving averages, are discussed in, for instance, [50]. James ([32], 
Section 4), in analogy to Lo and Weng [41], provides a partition-based rep- 
resentation of the posterior distribution for the case where [i is a completely 
random measure [36, 37] or related model. 

While models in (1) are special cases of (2) and, hence, they have a pos- 
terior distribution expressible in terms of random partitions, Dykstra and 
Laud [14] and Lo and Weng [41] showed that these models actually have a 
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considerably less complex representation in terms of what are called S-paths. 
Quite specifically, S-paths are combinatorial structures which, relative to p, 
only contain information about the maximal element and size of each cell C,- . 
This phenomenon is discussed in more detail in the case of monotone den- 
sities in [10]. In this work we note the fact that the occurrence of tractable 
S-paths for monotone hazard rates is due to the nice features of the indicator 
kernel. Hence, using this fact, we are able to refine the partition-based re- 
sults of James [32] to show that all such monotone hazard rates have S-path 
structures. This represents the first explicit characterization of this type. 
The main attractive feature is that the space of S-paths is considerably 
smaller than the space of partitions (see Table 2 in the Appendix). Hence, 
it has been recognized that if one could efficiently sample S-paths in this 
context, this would lead to more parsimonious methods for inference. How- 
ever, it turns out that the design and implementation of efficient numerical 
methods utilizing S-paths are not that obvious. Section 3 presents an effi- 
cient MCMC method for sampling directly S-paths induced by monotone 
hazard rates. We shall also extend this to the semi-parametric setting of a 
proportional hazard model with covariates in Section 6. 



2. A posterior distribution of a monotone hazard rate model via S-paths. 

This section concerns Bayes estimation of a decreasing hazard rate on the 
half line TZ, defined by (1), wherein p is taken to be a completely random 
measure without drift on TZ. The law of p is uniquely characterized by the 
Laplace functional 



(3) Cfj,(g\p,r])=exp 



f (i _ e - 9{u)z )p(dz\u)r](du) 



where g is a nonnegative function on TZ. We say that p is characterized by an 
intensity measure p(dz\u)rj(du), as it can be represented in a distributional 
sense as 

p(du) = / zAf(dz,du), 
Jn 

where M{dz,du) is a Poisson random measure, taking on points (z,u) in 
TZ x TZ, with mean intensity 

(4) E[J\f(dz, du)] = p{dz\u)ri{du), 

such that, for any bounded set B on the half line, J B J^mm(z, \)p(dz\u) x 
rj(du) < oo. 

Suppose we collect observations T = (T±, . . . , Tjv) from N items with haz- 
ard rates given by (1) until time r, so that T\ < ■ ■ ■ < T n < t denote the 
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completely observed failure times, and T n+ \ 
censored times. Define 



T/v = r are the right- 



(5) 



9n(u) 



N 

i=l 
-JV 



I(t < u) dt, 



and n(g N ) = f n g N (u)n(du) = / T Ef=i > t)]X(t\fi) dt, where E^i 1(7* > 
t) is called the total time transform [4]. Assuming a multiplicative intensity 
model [1, 2], the likelihood of the data is 



(6) 



AH 



(N — n)\ 



[ / I(Ti < Ui )n(d Ui ) 



exp[-fj,(g N )]- 



(See Remark 2.2.) Noticing that this is a special case of the Levy mov- 
ing averages model considered in [32] with a general kernel replaced by the 
indicator function, we can describe the posterior distribution in terms of par- 
titions of n integers (given in the Appendix). Due to the special structure of 
the indicator kernel [see (24)], we recognize from the posterior distribution 
that the information carried by a partition about the remaining members 
other than the maximal element in any cell is irrelevant. In other words, 
only the maximal element and the size of each cell is sufficient for this prob- 
lem. To summarize the information, we can define an integer-valued vector 
S = (So, Si, . . . , S n -i,S n ) (see [10, 14]), referred to as an S-path (of n + 1 co- 
ordinates), which satisfies (i) Sq = and S n = n; (ii) Sj < j, j = 1, . . . , n — 1; 



and (in) Sj < S j+1 , j = 1, 



,n ■ 



1. A path S is said to correspond to one or 



many partitions p, provided that (i) labels of the maximal elements of the 
n(p) cells in p coincide with locations j at which Sj > Sj—i, and (ii) the size 



of the cell Cj with a maximal element j is identical to rrij 
for all i = 1, . . . ,n(p) (see [10] and [22] for more discussion). 
Define fjy(z,u) =g^}(u)z. Given the data T, assume that 



S j s 



(7) 



Ki(e * N p\u) 



z e 



- 9N ^ z p(dz\u) < 00, 



for any integer i < n and a fixed u > 0. Write J2 S as summing over all paths 



S of the same number of coordinates, and Yl{j :mj> o} an d J2{j : 
n?=i:m,->o and E?=i :m ,>o conditioned on S, respectively. 



rrij>0} 



as 



Theorem 2.1. Suppose the likelihood of the data is (6), and n is a com- 
pletely random measure characterized by the Laplace functional (3). Then 
given the data T, the posterior law of fx can be described by a three-step 
experiment: 
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(i) An S-path S = (0, S±, . . . , 5 n _i, n) has a distribution Z(S) = 0(S)/ 
Es<K S )> where 

(8) *(S)= J] r^(e~ fN p\yHdy). 

{i:m J >0} V J 3 ' • 

(ii) Given S, i/iere exist Sj=i > Sj'-l) independent pairs of(yj,Qj), 
denoted by (y, Q) = {(yj,Qj) :mj > 0,j = l,...,n}, where yj\S,T is dis- 
tributed as 

(9) Vj(d yj \S, T) cx 1(2) < yj )K mj (e~^ p\yj)v(d yj ) 
and 

(10) Pr{Qj G dz|S, yi ,T} a e' 9 ^' p{dz\yj). 

(iii) Given (S,y, Q), /i /ias f/ie distribution of p* N + Y^,{j:m >o}Qj^yj> 
where p* N is a completely random measure characterized by e~ 9N ^ z p{dz\u)ri{du). 

Remark 2.1. The finiteness condition in (7) guarantees the existence 
of the posterior distributions of Qj\S, yj in statement of (ii) in Theorem 2.1. 

Corollary 2.1. Theorem 2.1 implies that the posterior mean of the 
decreasing hazard rate (1) given T is given by, for t € [0, r], 

;>oo n 

(11) E[A(t|u)|T] = / Kl (e-f"p\ y ) V (dy)+J2z(S)Y,mS), 

* s j=l 

where Ki{e~^ N p\y), i = 1, . . . , n, is defined in (7), Z(S) is gii>en in Theorem 
2.1 and 

(12) ^-^'YfT,^ 

i/mj > 0, otherwise 0. 

With the posterior consistency result of Dragichi and Ramamoorthi [13], 
the consistency of this Bayes estimate can be obtained via the same ar- 
gument used in Corollary 1 of [6]. In addition, this path-sum estimator is 
always less variable than its counterpart in terms of partitions due to the 
following Rao-Blackwellization result, which states that p|S,T is uniformly 
distributed over all partitions that correspond to the given path S. (See the 
proof of Lemma 2.1 in [10] for this total number.) 
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Lemma 2.1. Suppose SjT~Z(S). Then there exists a conditional dis- 
tribution 

(13) 7r(p|S,T) = - ' , ., . , peC s , 

ll{i:rry>0}l j-Sj ) 

where Cs is i/te collection of all partitions that correspond to path S. 

Remark 2.2. As discussed in [32], Section 4 (see also [3], Section III. 2), 
the multiplicative intensity model captures a large variety of models that ap- 
pear in event history analysis. Bayesian analysis for models under different 
censoring schemes, such as left truncation together with right censorship, 
and random censoring at different time points, follows similarly as the like- 
lihoods differ slightly from (6). 

Remark 2.3. One may model a "bathtub" or [/-shaped hazard rate 
with a minimum at a by [41] 

X(t\a,fj l ) = J ' I(\t — a\ > u) fi(du) , 

and obtain a posterior distribution characterized by S-paths. In particular, 
the posterior law of (a, fx) can then be jointly described by the posterior laws 
of a and //[a, where the latter follows naturally as a path characterization 
for a fixed a; see [10] and [23] for estimation in similar mixture models. 

3. The Markov chain Monte Carlo method. This section introduces an 
MCMC path-sampler to efficiently compute the hazard estimate (11) (and 
in general sums over S-paths that appear in many other aforementioned 
problems). The algorithm samples a Markov chain with a unique stationary 
distribution Z(S) in a state space as the collection of S-paths of n + 1 
coordinates. It is named an accelerated path (AP) sampler as it is designed 
to accelerate a straightforward Gibbs sampler [16] in the sense that the 
algorithm allows more efficient movements among different S-paths. 

A straightforward Gibbs sampler, which has a stationary distribution pro- 
portional to (p(S) [see (8)], can be defined [22]: Each Gibbs cycle consists of 
sampling SV|S-r, where S_ r = (Si, . . . , S r -i, S r +\, . . . , S n -i) is the "deleted- 
r" vector, and cycling through r = 1, . . . , n — 1. The conditional probabilities 
are 

Pr{5 r = j|S_ r } oc <f>(0,Si,.. ■ ,S r -i,j,S r+ i, . . . ,S n -i,n), 

for j = S r -i, S r ~i + 1, . . . , SV+i — 1, <5 r +i (subject to the definition of an S- 
path). At step r, S r would remain unchanged if 5 r _i = 5 r +i. This retards 
the convergence of the chain to its equilibrium state, and thus results in poor 
approximations of sums over S-paths. The above phenomenon motivates us 
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to accelerate this naive chain in accordance with an increasing number of 
possible movements among the state space within any step. Noticing that 
each step of the Gibbs sampler is equivalent to re-determinations of the two 
increments m T and m T+ i at locations r and r + 1, respectively, our idea is 
to replace m r+ i by some other m q such that, at any step, it is relatively less 
likely that the resulting chain is bounded to remain unchanged. Suppose 
q > r denotes the next location at which m q = S q — S q -i > 0. 

Algorithm 3.1 (Accelerated path sampler). A Markov chain of S- 
paths (of n + 1 coordinates), which has a unique stationary distribution 
Z(S) = 0(S)/X)s <?KS) with </>(S) given by (8), can be defined by a transition 
cycle of n — 1 steps: 

(i) At step r, suppose S = (0, Si, . . . , S r -i,c, . . . , c, S q , . . . , S n _i, n), where 
S r -i < c < min(r, S q -l).S moves to (0, Si, ... , S r _i, j, . . . ,j,S q , . . . , S„_i,n) 
with conditional probability proportional to 

(14) q 1 r 'i / K Sq - Sr ^ fN p\yHdy), 

b q — l- b r -i Jr q 

if j = S r _i; otherwise, if j € {S r _i + l,S r _i + 2, . . . , min(r, S q — 1)}, with 
probability proportional to 

/S g -S r _l-2\ fr 1 / i-j \ 

\ S q -j-l JJl^i-Sr.J 

(15) 

v ' /-oo poo 

x / Kj-S r -i{e~ fN p\y)ri(dy) x / K Sq -j{e- fN p\z)r]{dz). 

(ii) Repeat step (i) for r = 1, 2, . . . , n — 1 to complete a cycle. 

Starting with an arbitrary path S^°\ and repeating M cycles according 
to the above scheme, gives a Markov chain S^, S^, . . . , S( M ) with a unique 
stationary distribution Z(S). Then, for large M, the ergodic average 

(i6) \M(t) = J Ki( e -^HyM^) + ^EE A ;(*|s (i) ) 

* i=i j=i 

approximates the hazard estimate (11) [44]. 

The validity of the AP sampler is justified by an idea in [20] or [48] (see 
[22] for a proof in more detail in the gamma process case). One could al- 
ways define a sequence of reducible transition kernels p( r \ r = 1,2, . . . ,d, 
that all have the target stationary distribution. Multiplying them in se- 
ries gives a transition kernel P = P^ x p( 2 ) x • ■ • x with the target 
stationary distribution (from construction). If the chain defined by P is 
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irreducible, as all states communicate, the target stationary distribution 
will be unique. At each step r, r = 1, . . . ,n — 1(= d), of the AP sampler, 
the kernel p( r ) is defined by the probability that the chain moves from the 
path after step j — 1, So = (0, Si, ... , S r -i,c, . . . , c, S q , . . . , S n -i,n) to S* = 
(0,Si,...,S r -i,j,...,j,S q ,...,S n -i,n) for j G {S r -i,S r -i + 1,..., 
mm(r,S q — 1)}, which is proportional to </>(S*). That is, So communicates 
only with paths in the collection defined by {S : S = (0, Si, . . . , S r -i,j, . . . , j, 
S g , . . . , S n -i,n),j G {S r -i, S r -i + 1, . . . , min(r, S q — 1)}}. With this construc- 
tion, the kernel decomposes the state space of all S-paths of n + 1 coor- 
dinates into a finite collection of mutually exclusive communication classes 
(see Theorem 3 in [15], page 392). One can easily check that each kernel 
p( r ), though not irreducible, has a stationary distribution Z(S). More im- 
portantly, the chain defined by P = P^ x p( 2 ) x • • • x p( n ~ 1 ) is irreducible, 
as all states can communicate with the path S = (0, 0, . . . , 0, n) within one 
cycle. Hence, the AP sampler gives a Markov chain of S-paths with a unique 
stationary distribution Z(S). 

4. Examples. One can model [i in (1) by a variety of random measures. 
Corresponding posterior analyses follow from the results in the previous sec- 
tions. This section looks at two explicit examples wherein \i is characterized 
by the mean measure, 

(17) p a ^(dz\u)r](du) = — — ^ — -z~ a ~ l exp[-z/(3(u)] dz r)(du). 

1(1 — a) 

This class of random measures generalizes the generalized gamma random 
measure proposed by Brix [8], for < a < 1 and < (3 < oo, or, — oo < a < 
and < f3 < oo. It includes the weighted gamma process (when a = 0), a 
stable law with index < a < 1 (when (3 = oo), and the inverse- Gaussian 
process (when a = 1/2 and f3 > 0) (see [32]). 

4.1. The weighted gamma random measure. If a = in (17), [i is the 
weighted gamma random measure with shape measure rj and scale measure 
(3. Corollary 2.1 gives the Bayes estimate of the decreasing hazard rate (1) 
according to 

(18) K i (e- fN p\u) = (i-l)\x[p(u)- 1 +g N (u)]- i , i = l,2,...,n. 

To apply the AP sampler, one needs to compute conditional probabilities 
(14) and (15) that are proportional to 

(r-^-iJx^-^^lT) 

and 

h f-^Ml x^-Sr^Trmx^iTqlT), 
respectively, where £i(t|T) = St°[P( v )~ 1 + 9N{v)]~ l rj(dv), i = 1, . . . ,n. 
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4.2. The stable law. The stable law with index < a < 1 appears when 
(3 = oo in (17). The posterior mean of (1) defined by this class of random 
measures follows from Corollary 2.1, and it can be evaluated by implement- 
ing the AP sampler, based on 

r(i — a) 



Ki{e- fN p\y) 



r(l-a)\g N ( V )Y-"- 



1,2,..., n. 



5. Numerical results. This section addresses the effectiveness of the AP 
sampler for evaluating the hazard estimate (11). In particular, we select a 
special case of (1) wherein p is a gamma process. The posterior analysis 
follows from discussions in Section 4.1 with /?(•) = 1. Hence, the posterior 
mean of the monotone hazard rate reduces to 



(19) E[A(t|/i)|T] = £i(t|T) + £z*(S)£ 

S j=l 



$ m . + i(max(t,r.,-)|T) 



umiT) 



where Z*(S) oc \\ li: „, , 0} (j - 1 - 5j_i)!/(j - 5,-)! x U 3 (^|T). The com- 
plexity in evaluating £j(t\T) can be reduced by assuming a uniform shape 
probability from to 6(> r) for rj(-), even though the closed-form expression 
is tedious (see [25] for its exact expression). The methodology is tested by 
data from a piecewise constant hazard rate model, for which the hazard rate 
of an item is 



(20) 



X(t) 



1, 

0.5, 



0<t< 1, 
t> 1. 



Data are generated subject to a termination time r = 3, such that the cen- 
soring rate is about 15%. All simulation results that follow are based on a 
Monte Carlo size M = 1000, and an initial path = (0, 1, . . . , n — 1, n). 

Remark 5.1. In practice, the implementation of the AP sampler de- 
pends heavily on evaluations of the double integral 

roc 



ip-9N(u)z 



Z e 



ly Jll 

where, according to (5), 



p(dz\u) r){du), 



y>0,i = l,2,. 



f i-i 



9n(u) 



Y^Ti + (N-j + l)u, !Z)_i <u< Tj,j = 1, ... ,n + 1, 

i=i 

n 

J2T l + (N-n)r, 



U> T. 



< i=l 



It is important to note that the inner integral, which is defined to be 
Ki(e~* N p\u) in (7), is the conditional cumulant of an (exponentially tilted) 
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| N=100;n=82 • ■ - N=500; n=440 — N=1 000; n=886 —true hazard 

-I 1 ! 1 

12 3 

t 



Fig. 1. MCMC hazard estimates produced by the AP sampler. 

infinitely divisible random variable with density of an available form for any 
given p. Then it follows that the integral may be calculated using a result 
of T. N. Thiele, which gives a recursive relation between cumulants and mo- 
ments of a random variable (see, e.g., [18] and [43], Section 2.3). See [33], 
Section 4.1 and [34], Section 5, for more discussion of this problem appearing 
in other contexts. 

5.1. Resolution of the AP sampler. This section focuses on the conver- 
gence property of the hazard estimate (19) approximated by the AP sampler 
as the sample size N increases. Based on nested samples of sizes N = 100, 
500 and 1000, MCMC estimates (16) according to (18) are displayed in Fig- 
ure 1. The graphs echo the fact that the approximated Bayes estimate of 
the decreasing hazard function, Am(^)> tends to the "true" hazard rate (20) 
as sample size increases. We remark that the drop of the hazard estimates 
after t = 3 results from the fact that the estimates are mainly constructed 
based on the prior information, as no complete data is observed after that 
time point. 

5.2. Comparison with other methods. Path-sum estimates of monotone 
hazard rates, though they appeared two decades ago, have not received much 
attention and are not commonly used due to unavailability of efficient nu- 
merical methods; previous attempts by Brunner and Lo [10, 11] and Brunner 
[9] are all far from successful due to their incapability of sampling from de- 
sired posterior distributions of S-paths in the respective models. On the 
contrary, partition-sum counterparts have been used as a substitute [21, 25] 
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Table 1 




A large- 


■sample study of MCMC hazard estimates according to 1000 i 


ndependent 


replications 


of the accelerated path (AP) sampler, of the Gibbs path (gP) 


sampler and of 




the weighted Chinese 


restaurant process (gWCR) sampler 








Average of Standard error of 


t 


MCMC method 


hazard estimates hazard estimates 






at time t 


at time t 




AP 


0.9667340 


0.0038426 


0.5 


gP 


0.9677822 


0.0517161 




gWCR 


0.9668593 


0.0080103 




AP 


0.8815065 


0.0065156 


0.99 


gP 


0.8812967 


0.0594452 




gWCR 


0.8820966 


0.0097827 




AP 


0.8530503 


0.0067767 


1.01 


gP 


0.8524295 


0.0541091 




gWCR 


0.8537771 


0.0099268 




AP 


0.3708132 


0.0055500 


2.0 


g'P 


0.3692810 


0.0440281 




gWCR 


0.3707660 


0.0106327 



since there are many well-developed numerical methods for sampling par- 
titions (see, e.g., [28, 29, 30, 40]). This section aims at comparing MCMC 
hazard estimates produced by the AP sampler, the Gibbs path sampler de- 
fined in [22] and a Gibbs sampler for partitions defined in [40] (see [25] for 
an exact description of the algorithm being applied to this gamma model). 
Standard errors of MCMC hazard estimates by the three different methods 
are estimated by repetitions of experiment, and are used as the standard of 
comparison. 

Here the sample size is fixed at N = 100, and there are re = 82 complete 
observations in our simulated data set. Markov samples from all the three 
Markov chain experiments are collected after a "burn-in" period of 10000 
cycles. We compute 1000 independent hazard estimates by 1000 repetitions 
of each experiment. These are used to estimate the average and the standard 
error of the hazard estimates in the usual manner. The hazard rates X(t) 
at times t = 0.5, 0.99, 1.01 and 2.0 are studied. The points, 0.99 and 1.01, 
are near the change point at 1, and they seem to reflect well the effective- 
ness and the efficiency of the MCMC methods in the worst case. Table 1 
displays the averages and the standard errors of the 1000 realizations of the 
MCMC hazard estimates produced by the three methods. At different time 
points, the three averages are close to each other, yet the standard errors 
vary substantially. The standard error of hazard estimates produced by the 
AP sampler is the smallest among all the three methods. On one hand, the 
AP sampler definitely outweighs the naive Gibbs path sampler. On the other 
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hand, the AP sampler beats the closest competitor, the gWCR sampler, by 
a comfortable margin. These show that our "acceleration" scheme works 
extremely well. 

6. Proportional hazards. The Cox regression model [12] is an important 
example of the multiplicative intensity model that can allow incorporation 
of covariates, together with right independent censoring, in survival analysis. 
For Bayes inference of general hazard rates with the presence of covariates, 
see [27, 30, 31, 35], among others. Assume that the underlying hazard defined 
on 1Z is modeled by 



A(t|Z,0,/i) 



n 



exp(e T Z)I(t <u)n(du) 



where Z is a covariate vector with parameter vector 6, and \o(t\fi) = < 
u)n(du), same as (1), is a decreasing baseline hazard rate. Suppose we col- 
lect data until time r and the data D = ((71, Zi), . . . , (T/v, Z^)) summarize 
completely observed failure times T\ < • • • < T n and right-censored times 
Tj = r, % = n + 1, . . . , JV, associated with covariate vectors Zj, i = 1, . . . , N, 
with unknown parameter vector 6. Define fN,e( z , u ) = 9N,e( u ) z > where 



(21) 



9N,e(u) 



N 



^I(r 4 >t)exp(6> T Z 4 ) 



I(t < u) dt. 



Then the Cox proportional hazards likelihood may be written as 



(22) 



]exp(6> T Z i )A (r i |//) 



exp[-n(g Nt e)], 



where fi(gN,e) = J n gN,e(u)n(du) = J T Ej=iI(Ti > i) ex P( Zi)]\ (t\iJ,)dt. 
Assume J n z l e~ 9N ' e ^ z p(dz\u) < oo, for i = 1, . . . , n and a fixed u > 0. 

Proposition 6.1. Suppose the likelihood of the data is given by (22). 
Let w(d6) denote our prior for 9 and independently assume p is a com- 
pletely random measure characterized by the Laplace functional (3). Then 
the posterior distribution of n\6,T) can be described as a three-step hierar- 
chical experiment as in Theorem 2.1, of which /jv( - , •) and <7jv( - ) are replaced 
by /jv,e(v) andg Nt e(-), respectively. 



To evaluate any posterior quantities of model (22), such as the posterior 
mean of the underlying monotone baseline hazard rate and the posterior 
mean of the covariate parameters 0, run the following Gibbs sampler: 

1. Draw S Q,y, 0,D by implementing Algorithm 3.1 with /jv(-, ■) and <?at(-) 
replaced by //v,e(v) and gN,o('), respectively. 
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2. Draw Q,y|S,0,D according to the analogues of the conditional distri- 
butions (9) and (10) in Theorem 2.1 with /jv(t) and <7at(-) replaced by 
fN,e(;-) and g N ,e{-), respectively. 

3. Draw Q,y, S,D from the density proportional to 

7r(dO)B(0) Y[ e -flw,8(w)Qi ) 
{j:mj>0} 

where 



(23) B(0)=exp 



[ [ (l-e- 9N ' e ^ z )p(dz\u)r](du)\ f[exp(0 T ZA 
Jn Jtz J 



Remark 6.1. Note that gN,e( u ) is again a piecewise linear function of 
u as gN{u) in the case without covariates (discussed in Remark 5.1). This 
does not create any further complexities in evaluating integrals at steps 1 
and 2 of the Gibbs sampler. Step 3, which is of the same form as the step 
4 (for conditional draws of regression parameters 6) of the blocked Gibbs 
algorithm suggested by Ishwaran and James ([30], page 184), can be dealt 
with via a Metropolis step. 

Remark 6.2. We conclude here that S-paths may be derived from ev- 
ery exchangeable random partition p by summation. That is to say, the 
general correspondence between these structures is simply a combinatorial 
relationship. However, what we are exploiting is the fact that the mono- 
tone hazard rates models, due to the presence of the indicator kernel, are 
naturally representable in terms of tractable S-path structures. Thus, tech- 
nically our approach may be applied to models exhibiting similar structure 
(see, e.g., [10, 11, 23, 24]). With this point in mind, we note that recently 
James and Lau [33] show that the non-Gaussian Ornstein-Uhlenbeck models 
of Barndorff-Nielsen and Shephard [5] , which are also special types of Levy 
moving averages, also exhibit S-path structures which are amenable to our 
approach. It is important to note that those models are not of the form in 

(1). 

APPENDIX 



Proof of Theorem 2.1 and Lemma 2.1. The proof relies on the 
following two key structures: 

(*) For any p s= C s , 

n (p) , \ 
(24) Il(maxr i <u i )= 1(2} < %•), 

< I V ' ' {j:m,>0} 
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where v = (v%, . . . , f n (p)) represents the unique values among (m, . . . , u n ) 
in (6) and y = {yj : rrij > 0, j = 1, . . . , n} is a permutation of v according 
to p € C s . 

(★*) The total number of partitions that correspond to a given S equals 



(25) |C S |:=E 1= II 

PGC S {j: mj >0} 



j-l-Sj- 
3-Sj 



where X) P eC s represents summing over all partitions that correspond to 
S. 

According to James [32], the posterior law of p\T is equivalent to the distri- 
bution of a random measure p* N + J2i=i Ji^Vi ■ It is determined by the joint 
distribution of fi* N , J, v, p|T, which is proportional to 

n(p) n(p) 

(26) ¥(dfi* N ) J] Ji^e-^^pidJilvi) J] I[ maxTj < vA^dvi), 

i=i i=i VjeCl ' 

where J = (J±, . . . , J n ( p )), and F(dfi* N ) is a completely random measure char- 
acterized by an intensity measure e -5 "" 2 p(dz\u)rj(du) as in (iii) of Theorem 
2.1. 

Notice that (26), due to its irrelevance to the remaining members other 
than the maximal elements of the cells in a partition, may be rewritten in 
terms of the intrinsic characteristics of a path S , provided that p G Cs based 
on (*) as 

(27) P(d/4) J] Q^e-^y^p(dQ,\ yj ) J] l(Tj< yj )r,(d yj ), 

{j:mj>0} {j:m 3 >0} 

where Q = {Qj : rrij > 0, j = 1, . . . ,n}, is a relabeling of J according to the 
correspondence p G Cs- That is, the conditional law of p* N , J,v,p|T only 



Table 2 

Total numbers of S -paths and partitions versus sample 
sizes (n) 

n # of paths # of partitions Ratio in % 



1 


1 


1 


100.000 


3 


5 


5 


100.000 


5 


42 


52 


80.769 


7 


429 


877 


48.917 


10 


16,796 


115,975 


14.482 


15 


9,694,845 


1,382,958,545 


0.701 


20 


6,564,120,420 


51,724,158,235,372 


0.013 
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depends on p through S. Equation (27) and the equivalence in distribution 
relation between the two random measures, 



(28) 



( ™(p) ^1 r 

i=l {j:mj>0} 



imply that the posterior law of /i|T can be expressed in terms of p* N \Q, y, S, T 
and Q, y, S|T. Then, integrating out p* N and summing over all p £ Cs in (27) 
yields that the distribution of Q,y,S|T is proportional to 

(29) [] Qr^~ 9NiVj)Qj P(dQM II KTj<Vj)ri{dVi) E 1 

{j: mj >0} {j:mj>0} Lp€C s 

where X) P gc s 1 i s given in (**). Now, the laws of Q|S,y,T, y|S,T and S|T 
follow from Bayes' theorem and multiplication rule. Hence, the result in The- 
orem 2.1 follows, while the conditional distribution of p|S,T in Lemma 2.1 
follows by dividing (26) without the leading term P(d[i* N ) by (29). □ 

Proof of Proposition 6.1. Following the same arguments as in [32] 
in getting the posterior distribution given by (26) yields that the posterior 
law of is equivalent to the distribution of a random measure ijl* n + 

Y^=t Ji$Vi- It is determined by the joint distribution of p* N , J, v, p, #|D, 
which is proportional to 

n(p) 

W{dif N )ir{dO)B{0) J] J^e-^^pidJilvi) 

(30) ~ 

«(P) , X 

X TT II maxTj < V{ )r](dvi), 
t=i VjeCl ' 

where ¥(dp* N ) is a completely random measure characterized by an intensity 
measure e~ 9N > ( u ^ z p(dz\u)r](du), and B(B) is given by (23). As in the proof 
of Theorem 2.1, p* N \Q, y, S, 6, D is equivalent to F(dp* N ). Then, summing 
over all p € Cs in (30) yields that the conditional distribution of Q, y, S, 0|D 
is proportional to 

Tr(d9)B(0) [] Q^ie-wMQi p(dQj\yj) 

{j:m.j>0} 

(31) 

x [] I(Tj <y j Hdy j ) x |C S |, 
{j-. mj >o} 

due to (*) and (**). Hence, by Bayes' theorem and multiplication rule, 
the result follows from the conditional distribution of Q,y, S|#,D, which is 
proportional to (31) without the leading term Tr{d9)B{6). □ 
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